Skip to content

mintmaker: set GOMEMLIMIT to avoid controller OOMKilled crashes #7576

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged

Conversation

qixiang
Copy link
Contributor

@qixiang qixiang commented Aug 11, 2025

Set GOMEMLIMIT=7GiB so the Go runtime respects Kubernetes memory limits. Although the controller is allocated 8GiB memory, in large clusters (e.g., p02 with 5000+ components) it can still be OOMKilled, possibly due to the Go runtime ignoring container limits.

Set GOMEMLIMIT=7GiB so the Go runtime respects Kubernetes memory limits.
Although the controller is allocated 8GiB memory, in large clusters (e.g.,
p02 with 5000+ components) it can still be OOMKilled, possibly due to
the Go runtime ignoring container limits.
@openshift-ci openshift-ci bot requested review from FernandesMF and querti August 11, 2025 08:53
Copy link
Contributor

Code Review by Gemini

## Code Review

### `components/mintmaker/production/base/manager_patch.yaml`

**Improvement:**

The addition of `GOMEMLIMIT` is a good practice for Go applications running in containers to help the Go runtime better respect the allocated memory limits and prevent OOMKilled issues. The value of `7GiB` for an `8GiB` limit provides a reasonable buffer for non-Go memory usage.

```diff
--- a/components/mintmaker/production/base/manager_patch.yaml
+++ b/components/mintmaker/production/base/manager_patch.yaml
@@ -8,10 +8,13 @@ spec:
     spec:
       containers:
       - name: manager
         resources:
           limits:
             cpu: 500m
             memory: 8Gi
           requests:
             cpu: 100m
             memory: 8Gi
+        env:
+        - name: GOMEMLIMIT
+          value: "7GiB"

components/mintmaker/production/kflux-ocp-p01/kustomization.yaml

components/mintmaker/production/kflux-osp-p01/kustomization.yaml

components/mintmaker/production/kflux-prd-rh02/kustomization.yaml

components/mintmaker/production/kflux-prd-rh03/kustomization.yaml

components/mintmaker/production/kflux-rhel-p01/kustomization.yaml

components/mintmaker/production/stone-prod-p01/kustomization.yaml

Improvement:

Adding explicit image pinning for mintmaker in these production Kustomization files is a good practice for ensuring reproducibility and stability of deployments. Using a specific commit SHA as the newTag is robust.

--- a/components/mintmaker/production/kflux-ocp-p01/kustomization.yaml
+++ b/components/mintmaker/production/kflux-ocp-p01/kustomization.yaml
@@ -1,15 +1,21 @@
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
 resources:
 - ../base
 namespace: mintmaker
+
+images:
+  - name: quay.io/konflux-ci/mintmaker
+    newName: quay.io/konflux-ci/mintmaker
+    newTag: a3816e76f5e06ba3114731fc34fa14f76d442b6c
+
 patches:
   - path: pipelines-as-code-secret-path.yaml
     target:
       name: pipelines-as-code-secret
       group: external-secrets.io
       version: v1beta1
       kind: ExternalSecret
 
 components:
   - ../../components/rh-certs

(The changes for kflux-osp-p01, kflux-prd-rh02, kflux-prd-rh03, kflux-rhel-p01, stone-prod-p01 are identical to the above diff for kflux-ocp-p01.)

components/multi-platform-controller/base/kustomization.yaml

No issues:

This change updates the ref for remote Kustomize bases and the newTag for the multi-platform-controller and multi-platform-otp-server images. This is a standard version update and appears correct.

--- a/components/multi-platform-controller/base/kustomization.yaml
+++ b/components/multi-platform-controller/base/kustomization.yaml
@@ -1,19 +1,19 @@
 apiVersion: kustomize.config.k8s.io/v1beta1
 kind: Kustomization
 
 namespace: multi-platform-controller
 
 resources:
 - common
 - rbac
-- https://github.com/konflux-ci/multi-platform-controller/deploy/operator?ref=944fdc5bfb70a9e0ea84ea1232fc3ed2b8d924ba
-- https://github.com/konflux-ci/multi-platform-controller/deploy/otp?ref=944fdc5bfb70a9e0ea84ea1232fc3ed2b8d924ba
+- https://github.com/konflux-ci/multi-platform-controller/deploy/operator?ref=9f9f72459e22350ec61f1f0e7fce31ee3be53670
+- https://github.com/konflux-ci/multi-platform-controller/deploy/otp?ref=9f9f72459e22350ec61f1f0e7fce31ee3be53670
 
 
 images:
 - name: multi-platform-controller
   newName: quay.io/konflux-ci/multi-platform-controller
-  newTag: 944fdc5bfb70a9e0ea84ea1232fc3ed2b8d924ba
+  newTag: 9f9f72459e22350ec61f1f0e7fce31ee3be53670
 - name: multi-platform-otp-server
   newName: quay.io/konflux-ci/multi-platform-controller-otp-service
-  newTag: 944fdc5bfb70a9e0ea84ea1232fc3ed2b8d924ba
+  newTag: 9f9f72459e22350ec61f1f0e7fce31ee3be53670

@qixiang
Copy link
Contributor Author

qixiang commented Aug 11, 2025

Since earlier today, the mintmaker controller in the p02 cluster has been repeatedly crashing due to OOMKilled errors. I’m not certain of the exact root cause, but it appears to be related to the Go runtime not respecting container memory limits. I’m setting GOMEMLIMIT to check whether this helps.

@staticf0x
Copy link
Contributor

do you think we could log the memory in controller via https://pkg.go.dev/runtime/metrics ?

@qixiang
Copy link
Contributor Author

qixiang commented Aug 11, 2025

do you think we could log the memory in controller via https://pkg.go.dev/runtime/metrics ?

I’m currently investigating and analyzing the controller’s memory usage, and I will include this in the research plan

@staticf0x
Copy link
Contributor

do you think we could log the memory in controller via https://pkg.go.dev/runtime/metrics ?

I’m currently investigating and analyzing the controller’s memory usage, and I will include this in the research plan

Awesome, thank you, let's merge this in the meantime

@staticf0x
Copy link
Contributor

/approve
/lgtm

Copy link

openshift-ci bot commented Aug 11, 2025

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: qixiang, staticf0x

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@openshift-merge-bot openshift-merge-bot bot merged commit 8a0e051 into redhat-appstudio:main Aug 11, 2025
6 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants